Overview

Dataset Statistics

Number of Variables 12
Number of Rows 25480
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 14.1 MB
Average Row Size in Memory 581.9 B
Variable Types
  • Categorical: 9
  • Numerical: 3
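
The overview counts above can be reproduced with simple bookkeeping. The following is a minimal sketch using only the Python standard library; the function name `dataset_overview`, the rule for what counts as "missing", and the three-row sample are assumptions for illustration, not the profiling tool's actual implementation.

```python
from collections import Counter


def dataset_overview(rows, columns):
    """Summarize a table given as a list of equal-length tuples."""
    n_rows = len(rows)
    n_cells = n_rows * len(columns)
    # Assumption: None and empty strings count as missing cells.
    missing = sum(
        1 for row in rows for value in row if value is None or value == ""
    )
    # Assumption: a duplicate is any row that repeats an earlier row.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "n_variables": len(columns),
        "n_rows": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }


# Hypothetical three-row sample, not taken from the dataset.
sample_columns = ["case_id", "continent"]
sample_rows = [("EZYV01", "Asia"), ("EZYV01", "Asia"), ("EZYV03", None)]
overview = dataset_overview(sample_rows, sample_columns)
```

On the real dataset both the missing-cell and duplicate-row counts are reported as 0, so the two percentage fields would be 0.0 as well.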

Dataset Insights

  • no_of_employees is skewed [Skewed]
  • yr_of_estab is skewed [Skewed]
  • case_id has a high cardinality: 25480 distinct values [High Cardinality]
  • has_job_experience has constant length 1 [Constant Length]
  • requires_job_training has constant length 1 [Constant Length]
  • full_time_position has constant length 1 [Constant Length]
  • case_id has all distinct values [Unique]
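
The categorical flags listed above (Unique, High Cardinality, Constant Length) reduce to a few set operations on a column. This is a rough sketch only: the function name `column_flags` and the cut-off `high_cardinality_threshold=50` are hypothetical, and the report does not state which threshold it actually uses.

```python
def column_flags(values, high_cardinality_threshold=50):
    """Return simple insight flags for one categorical column."""
    texts = [str(v) for v in values]
    distinct = len(set(texts))
    lengths = {len(t) for t in texts}
    return {
        # Every value appears exactly once.
        "unique": distinct == len(texts),
        # Hypothetical threshold; the report's real cut-off is unknown.
        "high_cardinality": distinct > high_cardinality_threshold,
        # All values have the same number of characters.
        "constant_length": len(lengths) == 1,
    }


# Hypothetical inputs shaped like the flagged columns.
yes_no_flags = column_flags(["Y", "N", "Y"])
id_flags = column_flags([f"EZYV{i:02d}" for i in range(1, 101)])
```

A Y/N column such as has_job_experience is constant-length but not unique, whereas an identifier column such as case_id is unique and high-cardinality.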

Variables


case_id

categorical

Approximate Distinct Count 25480
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 1874423

Length

Mean 8.5645
Standard Deviation 0.5829
Median 9
Minimum 6
Maximum 9

Sample

1st row EZYV01
2nd row EZYV02
3rd row EZYV03
4th row EZYV04
5th row EZYV05

Letter

Count 101920
Lowercase Letter 0
Space Separator 0
Uppercase Letter 101920
Dash Punctuation 0
Decimal Number 116303
  • case_id contains many words: 25480 words
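
The Letter block counts characters by Unicode general category. For case_id the two non-zero counts add up to 101920 + 116303 = 218223 characters, and 218223 / 25480 ≈ 8.5645, which matches the mean length reported above. A minimal sketch of such a count follows; it assumes the labels map to the standard Unicode categories (Lu, Ll, Zs, Pd, Nd), and the helper name `char_category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def char_category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            # e.g. "Lu" = uppercase letter, "Nd" = decimal number
            counts[unicodedata.category(ch)] += 1
    return counts


# Two identifiers in the same style as the sample rows above.
counts = char_category_counts(["EZYV01", "EZYV25480"])
uppercase = counts["Lu"]
digits = counts["Nd"]
```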

continent

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1804558
  • The largest value (Asia) is over 4.52 times larger than the second largest value (Europe)

Length

Mean 5.8225
Standard Deviation 3.2546
Median 4
Minimum 4
Maximum 13

Sample

1st row Asia
2nd row Asia
3rd row Asia
4th row Asia
5th row Africa

Letter

Count 144214
Lowercase Letter 114590
Space Separator 4144
Uppercase Letter 29624
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Asia, Europe) take over 50.0%
  • Word frequency: the most frequent word (asia) occurs over 4.07 times as often as the second most frequent word (america)
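
The two ratios reported for continent (4.52 against Europe, 4.07 against america) differ, which is consistent with one being computed on whole category values and the other on lowercased words, where a multi-word value contributes to the count of each of its words. That reading is an assumption, not something the report states. The sketch below shows both calculations on hypothetical counts; `top_two_ratio` and `word_items` are illustrative names.

```python
from collections import Counter


def top_two_ratio(items):
    """Ratio of the most frequent item's count to the runner-up's count."""
    ranked = Counter(items).most_common(2)
    if len(ranked) < 2:
        return None  # fewer than two distinct items
    return ranked[0][1] / ranked[1][1]


def word_items(values):
    """Split values on whitespace and lowercase each word."""
    return [word.lower() for value in values for word in value.split()]


# Hypothetical counts, chosen only to show the effect.
values = (
    ["Asia"] * 8 + ["Europe"] * 2 + ["North America"] * 2 + ["South America"]
)
value_ratio = top_two_ratio(values)              # Asia vs. next category
word_ratio = top_two_ratio(word_items(values))   # asia vs. america
```

In this toy data the runner-up changes from a single category to the word "america", which pools two categories and lowers the ratio.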

education_of_employee

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1892960

Length

Mean 9.292
Standard Deviation 1.1097
Median 10
Minimum 8
Maximum 11

Sample

1st row High School
2nd row Master's
3rd row Bachelor's
4th row Bachelor's
5th row Master's

Letter

Count 213472
Lowercase Letter 184572
Space Separator 3420
Uppercase Letter 28900
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Bachelor's, Master's) take over 50.0%

has_job_experience

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1681680

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row N
2nd row Y
3rd row N
4th row N
5th row Y

Letter

Count 25480
Lowercase Letter 0
Space Separator 0
Uppercase Letter 25480
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Y, N) take over 50.0%
  • has_job_experience has words of constant length

requires_job_training

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1681680
  • The largest value (N) is over 7.62 times larger than the second largest value (Y)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row N
2nd row N
3rd row Y
4th row N
5th row N

Letter

Count 25480
Lowercase Letter 0
Space Separator 0
Uppercase Letter 25480
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (N, Y) take over 50.0%
  • Word frequency: the most frequent word (n) occurs over 7.62 times as often as the second most frequent word (y)
  • requires_job_training has words of constant length

no_of_employees

numerical

Approximate Distinct Count 7105
Approximate Unique (%) 27.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 407680
Mean 5667.0432
Minimum -26
Maximum 602069
Zeros 0
Zeros (%) 0.0%
Negatives 33
Negatives (%) 0.1%
  • no_of_employees is skewed right (γ1 = 12.2645)

Quantile Statistics

Minimum -26
5-th Percentile 209
Q1 1022
Median 2109
Q3 3504
95-th Percentile 14083
Maximum 602069
Range 602095
IQR 2482

Descriptive Statistics

Mean 5667.0432
Standard Deviation 22877.9288
Variance 5.234e+08
Sum 1.444e+08
Skewness 12.2645
Kurtosis 206.2943
Coefficient of Variation 4.037
  • no_of_employees is not normally distributed (p-value 4.330175315719114e-25)
  • no_of_employees has 1556 outliers
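
Several of the figures above are derived from others: Range = Maximum − Minimum, IQR = Q3 − Q1, and Coefficient of Variation = Standard Deviation / Mean. The sketch below recomputes them from the reported values. It also computes the 1.5 × IQR fences, a common box-plot convention for outliers; whether the report's count of 1556 outliers uses that exact rule is an assumption. The function name `derived_stats` is hypothetical.

```python
def derived_stats(minimum, q1, q3, maximum, mean, std):
    """Recompute range, IQR, CV and box-plot fences from summary values."""
    iqr = q3 - q1
    return {
        "range": maximum - minimum,
        "iqr": iqr,
        "cv": std / mean,
        # Assumption: the conventional 1.5 * IQR fences.
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


# Values copied from the no_of_employees statistics above.
stats = derived_stats(
    minimum=-26, q1=1022, q3=3504, maximum=602069,
    mean=5667.0432, std=22877.9288,
)
```

The negative minimum (-26) is what makes the range exceed the maximum; the 33 negative employee counts are worth reviewing before modelling.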

yr_of_estab

numerical

Approximate Distinct Count 199
Approximate Unique (%) 0.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 407680
Mean 1979.4099
Minimum 1800
Maximum 2016
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • yr_of_estab is skewed left (γ1 = -2.0372)

Quantile Statistics

Minimum 1800
5-th Percentile 1872
Q1 1976
Median 1997
Q3 2005
95-th Percentile 2012
Maximum 2016
Range 216
IQR 29

Descriptive Statistics

Mean 1979.4099
Standard Deviation 42.3669
Variance 1794.9567
Sum 5.0435e+07
Skewness -2.0372
Kurtosis 3.506
Coefficient of Variation 0.0214
  • yr_of_estab is not normally distributed (p-value 2.8433354669993744e-10)
  • yr_of_estab has 3260 outliers
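
The sign of γ1 tells which tail is longer: a few very old establishment years (minimum 1800 against a median of 1997) pull the mean below the median and give a negative skewness. A minimal sketch of the moment-based definition γ1 = m3 / m2^1.5 follows. It assumes the population (biased) form, which may not be the exact estimator the report uses, and the sample years are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: third central moment over m2 ** 1.5."""
    n = len(values)
    if n == 0:
        raise ValueError("skewness of an empty sequence is undefined")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical years: one old outlier creates a long left tail.
left_skewed = skewness([1800, 1990, 2000, 2005, 2010])
symmetric = skewness([1, 2, 3, 4, 5])
```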

region_of_employment

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1814783

Length

Mean 6.2238
Standard Deviation 1.9924
Median 5
Minimum 4
Maximum 9

Sample

1st row West
2nd row Northeast
3rd row West
4th row West
5th row South

Letter

Count 158583
Lowercase Letter 133103
Space Separator 0
Uppercase Letter 25480
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Northeast, South) take over 50.0%

prevailing_wage

numerical

Approximate Distinct Count 25454
Approximate Unique (%) 99.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 407680
Mean 74455.8146
Minimum 2.1367
Maximum 319210.27
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • prevailing_wage is skewed right (γ1 = 0.7557)

Quantile Statistics

Minimum 2.1367
5-th Percentile 444.7192
Q1 34015.48
Median 70308.21
Q3 107735.5125
95-th Percentile 162642.3175
Maximum 319210.27
Range 319208.1333
IQR 73720.0325

Descriptive Statistics

Mean 74455.8146
Standard Deviation 52815.9423
Variance 2.7895e+09
Sum 1.8971e+09
Skewness 0.7557
Kurtosis 0.8239
Coefficient of Variation 0.7094
  • prevailing_wage is not normally distributed (p-value 7.821546579739402e-06)
  • prevailing_wage has 427 outliers

unit_of_wage

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1758209
  • The largest value (Year) is over 10.65 times larger than the second largest value (Hour)

Length

Mean 4.0035
Standard Deviation 0.059
Median 4
Minimum 4
Maximum 5

Sample

1st row Hour
2nd row Year
3rd row Year
4th row Year
5th row Year

Letter

Count 102009
Lowercase Letter 76529
Space Separator 0
Uppercase Letter 25480
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Year, Hour) take over 50.0%
  • Word frequency: the most frequent word (year) occurs over 10.65 times as often as the second most frequent word (hour)

full_time_position

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1681680
  • The largest value (Y) is over 8.41 times larger than the second largest value (N)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row Y
2nd row Y
3rd row Y
4th row Y
5th row Y

Letter

Count 25480
Lowercase Letter 0
Space Separator 0
Uppercase Letter 25480
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Y, N) take over 50.0%
  • Word frequency: the most frequent word (y) occurs over 8.41 times as often as the second most frequent word (n)
  • full_time_position has words of constant length

case_status

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1860134
  • The largest value (Certified) is over 2.01 times larger than the second largest value (Denied)

Length

Mean 8.0037
Standard Deviation 1.4129
Median 9
Minimum 6
Maximum 9

Sample

1st row Denied
2nd row Certified
3rd row Denied
4th row Denied
5th row Certified

Letter

Count 203934
Lowercase Letter 178454
Space Separator 0
Uppercase Letter 25480
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Certified, Denied) take over 50.0%
  • Word frequency: the most frequent word (certified) occurs over 2.01 times as often as the second most frequent word (denied)

Interactions

Correlations

Missing Values